Python: draft initial implementation of Realtime API #10127

Draft: wants to merge 25 commits into base: main

eavanvalkenburg (Member)

Motivation and Context

Implements the OpenAI Realtime API with Semantic Kernel

Description

Implements a separate service client class with its own ExecutionSettings, still based on ChatCompletionClientBase.
Only streaming operations are supported, with additional public methods for sending data to the conversation.
Whether this is the right way forward is still TBD.
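
To make the shape described above concrete, here is a minimal sketch, not the code in this PR. `SketchRealtimeSettings`, `SketchRealtimeClient`, `send`, `close`, and `stream` are all hypothetical names, and the echo behaviour stands in for a real service connection. It only illustrates a client that has its own settings object, exposes streaming as the sole way to read, and offers a public method for pushing data into the conversation.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class SketchRealtimeSettings:
    """Hypothetical stand-in for the client's own execution settings."""

    voice: str = "default"


class SketchRealtimeClient:
    """Hypothetical streaming-only client; not the class added in this PR."""

    def __init__(self, settings: SketchRealtimeSettings) -> None:
        self.settings = settings
        # Items sent by the caller; None marks the end of the conversation.
        self._outbox: asyncio.Queue = asyncio.Queue()

    async def send(self, data: str) -> None:
        """Public method for pushing data into the ongoing conversation."""
        await self._outbox.put(data)

    async def close(self) -> None:
        """Signal that no more data will be sent."""
        await self._outbox.put(None)

    async def stream(self) -> AsyncIterator[str]:
        """Yield responses one at a time; there is no non-streaming call."""
        while (item := await self._outbox.get()) is not None:
            # Assumption: a fake echo replaces the real service round trip.
            yield f"echo[{self.settings.voice}]: {item}"


async def demo() -> list[str]:
    client = SketchRealtimeClient(SketchRealtimeSettings(voice="alloy"))
    await client.send("hello")
    await client.send("world")
    await client.close()
    return [chunk async for chunk in client.stream()]
```

Running `asyncio.run(demo())` collects the two echoed chunks. In a real client, sending and streaming would happen concurrently rather than one after the other.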

TODO:

  • lots of comments
  • tests
  • cleanup

Contribution Checklist

@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner January 8, 2025 16:04
@eavanvalkenburg eavanvalkenburg marked this pull request as draft January 8, 2025 16:04
@markwallace-microsoft markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Jan 8, 2025
markwallace-microsoft (Member) commented Jan 9, 2025

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| semantic_kernel/connectors/ai | | | | |
| chat_completion_client_base.py | 124 | 2 | 98% | 395, 405 |
| function_calling_utils.py | 52 | 10 | 81% | 161–186 |
| realtime_client_base.py | 22 | 4 | 82% | 102, 109–110, 114 |
| semantic_kernel/connectors/ai/open_ai/services | | | | |
| open_ai_realtime.py | 37 | 14 | 62% | 35–37, 84–98, 127, 147 |
| semantic_kernel/connectors/ai/open_ai/services/realtime | | | | |
| open_ai_realtime_base.py | 181 | 130 | 28% | 79–117, 127–163, 170–203, 211–384, 393–397, 403, 412, 416, 420 |
| open_ai_realtime_webrtc.py | 122 | 83 | 32% | 61–63, 66–74, 84–129, 134–141, 144–171, 179–188, 192–211 |
| open_ai_realtime_websocket.py | 58 | 29 | 50% | 46–64, 67–73, 83–86, 91–94 |
| utils.py | 40 | 32 | 20% | 38–44, 54, 69–126 |
| semantic_kernel/contents | | | | |
| audio_content.py | 25 | 2 | 92% | 81, 86 |
| binary_content.py | 115 | 16 | 86% | 81, 120, 138–139, 180–184, 192–198 |
| function_call_content.py | 107 | 3 | 97% | 197, 225–226 |
| semantic_kernel/contents/utils | | | | |
| data_uri.py | 101 | 4 | 96% | 44–45, 68, 133 |
| TOTAL | 17721 | 2182 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 3049 | 4 💤 | 0 ❌ | 0 🔥 | 1m 20s ⏱️ |

docs/decisions/00XX-realtime-api-clients.md

# Content and Events

## Considered Options - Content and Events
Contributor

Should we call out whether the “control” versus “content” distinction is a fundamental part of real-time interaction or just an implementation detail? For example, OpenAI distinguishes control events (input_audio_buffer.committed) from content events (conversation.item.create), while Google appears to treat everything as part of a unified content stream (BidiGenerateContent*).

This distinction might influence our decision in a few ways:

  • If the distinction is inherent to real-time systems, separating control from content may result in a cleaner, more flexible design.
  • However, if it’s just a specific quirk of OpenAI’s API, enforcing it could complicate support for providers like Google that don’t make the same distinction.
  • On the other hand, ignoring OpenAI’s finer-grained controls might limit the ability to fully utilize other features in the future.

I think it would make sense to call this out explicitly in the doc; doing so would provide additional context for why we’re choosing one approach over the other.
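
To make the trade-off concrete, here is a rough sketch of what a provider-agnostic event with an explicit control/content tag could look like. `EventKind`, `RealtimeEvent`, `from_openai`, and `from_unified_stream` are hypothetical names, not Semantic Kernel types; only the two OpenAI event names come from the examples above, and the choice to treat unknown OpenAI events as control is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    CONTROL = "control"
    CONTENT = "content"


@dataclass(frozen=True)
class RealtimeEvent:
    """Hypothetical provider-agnostic wrapper; not a Semantic Kernel type."""

    kind: EventKind
    provider_type: str  # the provider's own event or message name


# The two event names are the examples from this comment; the mapping is illustrative.
OPENAI_KINDS = {
    "input_audio_buffer.committed": EventKind.CONTROL,
    "conversation.item.create": EventKind.CONTENT,
}


def from_openai(event_type: str) -> RealtimeEvent:
    # Assumption: OpenAI events not listed above are treated as control events.
    kind = OPENAI_KINDS.get(event_type, EventKind.CONTROL)
    return RealtimeEvent(kind, event_type)


def from_unified_stream(message_type: str) -> RealtimeEvent:
    # A provider with one unified content stream tags everything as content.
    return RealtimeEvent(EventKind.CONTENT, message_type)
```

With this shape, a provider that makes no distinction simply never emits `EventKind.CONTROL`, while consumers that care about OpenAI's finer-grained events can still filter on `kind`.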

@eavanvalkenburg eavanvalkenburg mentioned this pull request Jan 31, 2025